fix(mcp): get_chart_sql drops x_axis on echarts_timeseries_* and only renders one query for mixed_timeseries#39865
Conversation
… renders one query for mixed_timeseries - Add `_extract_x_axis_col` helper to normalise the x_axis field (plain string or adhoc column dict) into a bare column name. - In `_build_query_context_from_form_data`, detect echarts_timeseries_* and mixed_timeseries viz types and prepend the x_axis column to the query's columns list so the temporal dimension appears in the generated SQL (was previously dropped because x_axis is stored separately from groupby in form_data). - For mixed_timeseries, build a second query dict from metrics_b / groupby_b and pass queries=[primary, secondary] to QueryContextFactory. Previously only the primary query was built, so the secondary series SQL was silently lost. - Add TestExtractXAxisCol and TestBuildQueryContextTimeseriesAndMixed test classes covering both bug fixes.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #39865 +/- ##
==========================================
- Coverage 64.35% 63.86% -0.50%
==========================================
Files 2569 2583 +14
Lines 134680 136666 +1986
Branches 31254 31502 +248
==========================================
+ Hits 86679 87287 +608
- Misses 46505 47863 +1358
- Partials 1496 1516 +20
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR fixes regressions in the MCP get_chart_sql tool’s query-context construction so that SQL rendering matches how Superset Explore builds queries for ECharts timeseries and mixed-timeseries charts.
Changes:
- Add
x_axisextraction and prepend it to query columns forecharts_timeseries_*andmixed_timeseriesso the temporal axis is not dropped from rendered SQL. - Build and pass two query dicts for
mixed_timeseries(primary + secondary) so both series layers’ SQL is returned. - Improve chart schema validation ergonomics (relax column-name regex constraints, add richer validation error formatting) and expand unit test coverage around these behaviors.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
superset/mcp_service/chart/tool/get_chart_sql.py |
Adds helpers to include x_axis in query columns and emits 2 queries for mixed_timeseries when building QueryContextFactory input. |
superset/mcp_service/chart/validation/schema_validator.py |
Formats Pydantic validation errors into more actionable per-field details and deduped suggestions. |
superset/mcp_service/chart/schemas.py |
Relaxes strict regex constraints on column-like fields (e.g., digit-prefixed / locale chars). |
superset/mcp_service/chart/tool/generate_chart.py |
Expands tool docstring with additional request examples for several chart types. |
tests/unit_tests/mcp_service/chart/tool/test_get_chart_sql.py |
Adds unit tests for _extract_x_axis_col and for correct query building for ECharts timeseries + mixed timeseries. |
tests/unit_tests/mcp_service/chart/test_chart_schemas.py |
Adds tests verifying relaxed column-name acceptance and continued blocking of obvious XSS/SQL-injection attempts (where applicable). |
Comments suppressed due to low confidence (1)
superset/mcp_service/chart/schemas.py:785
- FilterConfig.column removed the regex constraint, but sanitize_column still calls sanitize_user_input() without check_sql_keywords=True. As a result, values containing SQL metacharacters/comments (e.g. ';', '--', '/*') and dangerous SQL keywords are no longer rejected at validation time, which weakens the injection protections compared to the previous regex. Consider enabling check_sql_keywords=True (similar to ColumnRef.sanitize_name) so relaxed column naming stays compatible with locale chars while still blocking SQL injection patterns.
column: str = Field(
...,
min_length=1,
max_length=255,
# No regex pattern: sanitize_name() already blocks XSS/SQL injection;
# many valid column names (digit-prefixed, locale chars, etc.) would
# be rejected by a strict pattern while posing no security risk.
# Use get_dataset_info to find exact column names.
validation_alias=AliasChoices("column", "col"),
)
op: Literal[
"=",
">",
"<",
">=",
"<=",
"!=",
"LIKE",
"ILIKE",
"NOT LIKE",
"IN",
"NOT IN",
] = Field(
...,
description="LIKE/ILIKE use % wildcards. IN/NOT IN take a list.",
validation_alias=AliasChoices("op", "operator", "opr"),
)
value: str | int | float | bool | list[str | int | float | bool] = Field(
...,
description="For IN/NOT IN, provide a list.",
validation_alias=AliasChoices("value", "val"),
)
@field_validator("column")
@classmethod
def sanitize_column(cls, v: str) -> str:
"""Sanitize filter column name to prevent injection attacks."""
# sanitize_user_input raises ValueError when allow_empty=False (default)
# so the return value is guaranteed to be a non-None str
return sanitize_user_input(v, "Filter column", max_length=255) # type: ignore[return-value]
afddf60 to
213fffd
Compare
…s_secondary to reduce complexity Move the nested _make_query_dict helper and the inline mixed_timeseries secondary query logic out of _build_query_context_from_form_data into module-level functions. This brings the McCabe complexity of _build_query_context_from_form_data back within the C901 limit of 10.
213fffd to
d099748
Compare
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
…ghten types - _build_mixed_timeseries_secondary: apply time_range_b from form_data to the secondary query dict when present, so charts with an independent secondary time range produce correct SQL via the fallback path - Change groupby_b annotation from list[str] to list[Any] to reflect that _build_single_query_dict columns accept Any items (e.g. adhoc dicts) - Add test: secondary time_range overridden when time_range_b is set - Add test: SQL-expression x_axis (no column_name) returns None - Fix test class docstring: remove stale PR reference
| if (row_limit := form_data.get("row_limit")) is not None: | ||
| qd["row_limit"] = row_limit |
There was a problem hiding this comment.
Suggestion: Query dict construction always reads row_limit and never supports row_limit_b, so the secondary mixed_timeseries query cannot honor its own row limit. This causes incorrect SQL LIMIT behavior for query B. Use the secondary-specific limit when building the secondary query. [logic error]
Severity Level: Major ⚠️
- ❌ Secondary series row limits ignored in get_chart_sql output.
- ⚠️ SQL preview shows incorrect LIMIT for mixed_timeseries B.Steps of Reproduction ✅
1. Configure a `mixed_timeseries` chart whose Explore form_data contains both `row_limit`
and a different `row_limit_b` (per-series secondary row limit). Example persisted
configuration appears in `superset/examples/featured_charts/charts/Mixed.yaml:80-81` where
both `row_limit` and `row_limit_b` are defined.
2. Use the MCP `get_chart_sql` tool for this chart in a context that goes through the
form_data fallback, e.g. request SQL for an unsaved Explore state via `form_data_key` so
`_handle_unsaved_chart_sql` (`superset/mcp_service/chart/tool/get_chart_sql.py:24-56`)
loads the cached form_data and calls `_sql_from_form_data(form_data, chart=None)` at lines
39-56.
3. `_sql_from_form_data` invokes `_build_query_context_from_form_data` (lines 218-305),
which builds the primary query via `_build_single_query_dict(form_data, groupby, metrics)`
and the secondary mixed_timeseries query via `_build_mixed_timeseries_secondary` (lines
196-215). Inside `_build_mixed_timeseries_secondary`, the secondary query dict is
constructed with `qd = _build_single_query_dict(form_data, groupby_b, metrics_b)` at line
212.
4. `_build_single_query_dict` (lines 180-193) always reads `row_limit` from
`form_data.get("row_limit")` and, if set, assigns `qd["row_limit"] = row_limit` (lines
191-192). There is no handling of `row_limit_b`, so both the primary query and secondary
query use the same `row_limit` value. When `ChartDataCommand` executes this `QueryContext`
(`_sql_from_form_data` lines 44-49), the generated SQL for series B uses the primary row
limit instead of the configured `row_limit_b`, making the SQL preview inconsistent with
the chart configuration.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/tool/get_chart_sql.py
**Line:** 191:192
**Comment:**
*Logic Error: Query dict construction always reads `row_limit` and never supports `row_limit_b`, so the secondary `mixed_timeseries` query cannot honor its own row limit. This causes incorrect SQL LIMIT behavior for query B. Use the secondary-specific limit when building the secondary query.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| groupby_b: list[Any] = [raw_b] if isinstance(raw_b, str) else list(raw_b) | ||
| if x_axis_col and x_axis_col not in groupby_b: | ||
| groupby_b = [x_axis_col] + groupby_b | ||
| qd = _build_single_query_dict(form_data, groupby_b, metrics_b) |
There was a problem hiding this comment.
Suggestion: The secondary query for mixed_timeseries is built from the original form_data without translating _b controls to base keys, so secondary-only filters (for example adhoc_filters_b after preprocessing) are never applied. This makes query B SQL use primary filter state and can return incorrect results. Build a secondary form-data view (equivalent to retainFormDataSuffix(..., "_b")) before constructing query B. [logic error]
Severity Level: Major ⚠️
- ❌ get_chart_sql misrepresents mixed_timeseries secondary series filters.
- ⚠️ Unsaved mixed_timeseries SQL omits secondary adhoc_filters_b.Steps of Reproduction ✅
1. Create or edit a `mixed_timeseries` chart in Explore so its form_data (as stored in
`Slice.params` / cached via `GetFormDataCommand`) contains both primary and secondary
series, including `metrics`, `groupby`, `metrics_b`, `groupby_b`, and a secondary-only
adhoc filter in `adhoc_filters_b`. This matches how mixed charts are migrated/built in
`superset/migrations/shared/migrate_viz/processors.py:17-28` where `adhoc_filters_b` is
populated from `adhoc_filters`.
2. From an MCP client, request SQL for this chart using only `form_data_key` (unsaved
state), triggering `_handle_unsaved_chart_sql` at
`superset/mcp_service/chart/tool/get_chart_sql.py:24-56`, which loads the cached form_data
and calls `_sql_from_form_data(form_data, chart=None)` at lines 39-56.
3. `_sql_from_form_data` calls `_build_query_context_from_form_data` (lines 218-305).
Inside that function, adhoc filters are preprocessed by `merge_extra_filters` and
`split_adhoc_filters_into_base_filters` at lines 250-263, but
`split_adhoc_filters_into_base_filters` (`superset/utils/core.py:1387-1400`) only reads
the `adhoc_filters` key, never `adhoc_filters_b`, so secondary-only adhoc filters remain
unprocessed on `form_data` and are not converted into `filters`/`where`/`having`.
4. Still in `_build_query_context_from_form_data`, the secondary query dict for
mixed_timeseries is built by `_build_mixed_timeseries_secondary` (lines 196-215), which
calls `qd = _build_single_query_dict(form_data, groupby_b, metrics_b)` at line 212.
`_build_single_query_dict` (lines 180-193) pulls `filters` and `time_range` from the
shared `form_data`, so the secondary query's `qd["filters"]` is based only on primary
`adhoc_filters` that were split into base filters, while `adhoc_filters_b` is never
applied. When `ChartDataCommand` later runs on this `QueryContext` (`_sql_from_form_data`
lines 44-49), the rendered SQL for query B ignores its intended secondary-only filters.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/tool/get_chart_sql.py
**Line:** 212:212
**Comment:**
*Logic Error: The secondary query for `mixed_timeseries` is built from the original `form_data` without translating `_b` controls to base keys, so secondary-only filters (for example `adhoc_filters_b` after preprocessing) are never applied. This makes query B SQL use primary filter state and can return incorrect results. Build a secondary form-data view (equivalent to `retainFormDataSuffix(..., "_b")`) before constructing query B.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
Code Review Agent Run #a593d2Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
…econdary query The secondary series in mixed_timeseries can have independent row limit and adhoc filters configured separately from the primary series. - _build_mixed_timeseries_secondary now accepts engine and uses it to process adhoc_filters_b via split_adhoc_filters_into_base_filters, replacing the inherited primary filters on the secondary query dict - row_limit_b overrides row_limit on the secondary query dict when set - Tests: row_limit_b, adhoc_filters_b each exercised independently
Code Review Agent Run #58d06cActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Summary
Fixes two bugs in
get_chart_sqlintroduced after #38700:Bug 1 — x_axis dropped for
echarts_timeseries_*:For XY/timeseries charts the temporal column is stored in
form_data["x_axis"], not ingroupby. When_build_query_context_from_form_databuilt the query dict it only usedgroupby, so the time axis column was silently missing from the generated SQL.Bug 2 —
mixed_timeseriesrenders only one query:mixed_timeserieshas two independent series layers (primary:metrics/groupby, secondary:metrics_b/groupby_b). Previously only the primary query was built and passed toQueryContextFactory, soget_chart_sqlreturned SQL for only half the chart.Changes
_extract_x_axis_col: new helper that normalisesx_axis(plain string or adhoc column dict) to a bare column name._build_single_query_dict: extracted from an inner function so complexity stays within the C901 limit._build_mixed_timeseries_secondary: new helper that builds the secondary query dict frommetrics_b/groupby_b._build_query_context_from_form_data: detectsecharts_timeseries_*andmixed_timeseriesviz types; prependsx_axisto columns; passesqueries=[primary, secondary]formixed_timeseries.TestExtractXAxisColandTestBuildQueryContextTimeseriesAndMixedtest classes.Testing